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ABSTRACT 

This  paper  addresses  the  construction  of  a  consistent  interval  estimator  for  the  steady-state  mean  within  a 
replication/deletion  framework  for  output  analysis  when  MSER  truncation  is  applied.  Because  the  MSER 
truncation  point  is  a  random  variable,  the  truncated  output  sequences  for  each  replication  typically  are  un¬ 
equal  in  length.  A  weighting  scheme  is  applied  to  the  replication  means  to  correct  for  unequal  sample 
sizes,  as  is  standard  in  ANOVA.  A  numerical  example  is  provided  to  illustrate  the  procedure  and  conse¬ 
quences. 

1  INTRODUCTION 

Over  a  half-century  ago,  Conway  recognized  the  initial  transient  as  the  first  among  three  principle  tactical 
problems  in  steady-state  simulation  (Conway  1959;  Conway,  et  al.  1963;  Goldsman,  et  al.  2010).  An  ar¬ 
bitrary  selection  of  initial  conditions  for  simulation  runs  introduces  bias  in  the  estimation  of  output  statis¬ 
tics,  such  as  the  stationary  mean.  The  most  common  approach  for  mitigating  such  bias  is  to  truncate  or 
delete  some  number  of  observations  from  the  beginning  of  the  output  sequence  and  to  compute  statistics 
using  only  the  remaining  observations. 

Alternate  criteria  for  determining  a  good  truncation  point — one  that  adequately  removes  the  bias 
without  undue  loss  of  precision — have  been  the  subject  of  continuing  invention  beginning  with  Conway 
himself.  Recently,  however,  a  consensus  has  emerged  among  researchers  that  MSER  has  all  of  the  prop¬ 
erties  most  desired  in  a  truncation  criterion.  It  is  effective  and  efficient  at  mitigating  bias,  robust  across 
alternate  forms  of  biasing  functions,  computationally  trivial,  easily  understood,  and  does  not  require  expe¬ 
rimenter  intervention  to  establish  parameters. 

MSER  was  initially  developed  by  McClamon  (1990),  White  and  Minnox  (1994),  and  White  (1997) 
and  was  applied  and  extended  by  Rossetti  et  al.  (1995),  Spratt(1998),  Cobb  (2000),  White,  et  al  (2000), 
and  Franklin  (2009).  Mahajan  and  Ingalls  (2004)  determined  three  truncation  criteria  adequate,  with 
MSER-5  recommended  for  its  efficiency  and  robustness.  Oh  and  Park  (2006)  compared  their  EVR  me¬ 
thod  “with  the  method  MSER-m  known  as  the  most  sensitive  rule  in  detecting  bias  and  most  consistent 
rule  in  mitigating  its  effects.”  MSER  was  shown  to  outperform  EVR  in  almost  all  experiments.  Sandikci 
and  Sabuncuogy  (2006)  automated  MSER-5  as  their  means  for  studying  transients.  Bertoli,  Casale,  and 
Serazzi  (2007,  2009)  selected  MSER-5  as  the  initialization  approach  for  their  Java  Modeling  Tools  pack¬ 
age  and  included  a  usage  wizard.  The  criterion  gained  additional  traction  with  an  exhaustive  empirical 
evaluation  by  Hoad,  et  al  (2008,  2011),  who  chose  MSER-5  as  the  most  suitable  for  automation  over  a 
wide  range  of  published  approaches  to  the  transient  problem,  including  heuristics,  graphical  procedures, 
initialization  bias  tests,  statistical  methods,  and  hybrid  approaches. 
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White  and  Franklin  (20 1 0)  confirm  the  empirical  findings  of  White  and  Robinson  (20 1 0)  regarding 
the  relationship  between  the  MSER  truncation  point  and  the  degree  of  mean  bias  and  autocorrelation  in  an 
output  sequence.  They  introduce  a  parametric  approach  to  analyzing  the  expected  behavior  of  MSER  and 
apply  this  approach  to  an  output  model  with  geometrically  decaying  bias  and  constant -parameter  AR(1) 
white  noise.  Franklin  et  al  (2009)  explore  the  intuition  that  MSER  minimizes  the  mean  squared  error 
(MSE)  of  the  mean  estimator.  This  empirical  result  is  confirmed  by  Pasupathy  and  Schmeiser  (2010). 
They  reason  that  MSE  is  the  most  appropriate  criterion  for  evaluating  alternate  truncation  criteria,  show 
that  the  MSER  statistic  is  asymptotically  proportional  to  the  MSE,  and  conclude  that  the  MSER  statistic  is 
a  solid  foundation  for  initial-transient  algorithms.  Pasupathy  and  Schmeiser  also  suggest  two  new  algo¬ 
rithms  using  the  MSER  statistic  and  compare  these  to  the  original  MSER  algorithm  using  empirical  re¬ 
sults  for  M/M/1  and  AR(1)  data  processes.  Mokashi  et  al  (2010)  compared  their  N-Skart  method  with 
MSER-5  and  achieved  only  modest  improvements  with  considerably  greater  computational  effort.  Most 
recently,  Eload  and  Robinson  (2011)  consider  the  practical  implementation  of  MSER-5. 

2  TRUNCATION  AND  THE  MSER  CRITERION 

Denote  the  output  of  a  single  replication  of  a  simulation  as  the  time  series  [y,:  i=  1,2,. .  .,77].  Truncation  di¬ 
vides  this  into  two  subseries  \{yf.  i=\  ,2,...,c/),(y,:  i=d+\,2,...,ri)\,  where  d  is  the  truncation  point.  For  an 
output  that  is  tallied,  under  truncation  the  estimator  for  the  mean  output  is  the  sample  mean  of  the  second 
(reserved)  subseries 


Y(n,d\y0) 


1 


n-d 


i=d+l 


The  MSER  criterion  for  the  optimal  truncation  point  is 

d*  =  arg  miv\MSER{n,  d  |  y0)] 

n»d>  0 

where  the  MSER  statistic  is  the  square  of  the  estimated  standard  error  of  the  mean 
MSER(n,d  |  _y0 )  =  SEy(n,cl  |  _y0 )  =  Sf{n,d\  _v0 )  /  n  obtained  using  the  large -sample  variance 


Sy  ( n,d  \y0)  =  - YjyYl  -Y(n, d  \  y0)f 
{n-d)  i=d+l 

n»d>  0 


Note  that  for  correlated  data  the  sample  variance  is  a  biased  estimator.  For  a  covariance-stationary 
process  the  actual  squared  standard  error  is 


2  n- 1  r 

se2y(x)  =  —  1  +  2Y  1 

n  L  tr \ 


\Pi 


nj 


where  <f  is  the  lag-zero  autocovariance  and  pk  is  the  lag  k  -lag  autocorrelation.  Stationarity  requires  that 
the  bracketed  term  is  finite  as  77— »oo.  In  practical  terms  this  means  that  for  sufficiently  large  n  the  brack¬ 
eted  term  becomes  de  facto  a  constant.  Therefore  we  can  consider  SE~  =  ca~  / n  and  estimate  it  as 

SEy  =  cS y  /  n  .  Since  the  MSER  truncation  point  is  based  on  minimizing  SE  j  (rather  than  estimating  it), 
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we  can  effectively  ignore  the  constant  and  determine  a  suitable  MSER  truncation  point  based  on  just  the 
familiar  variance  estimator.  This  explains  why  Franklin  and  White  (2008)  found  that,  as  expected,  the 
Phillips-Perron  variance  estimator  performed  no  better  in  practice  than  the  naive  variance  estimator. 

3  APPLYING  MSER  WITHIN  A  REPLICATION/DELETION  FRAMEWORK 

Denote  the  output  of  m  independent  replications  of  a  simulation  as  the  set  of  m  time  series  \yf 
i=\, 2, y'=l, Without  loss  of  generality,  consider  that  each  of  these  series  has  the  same  initial 
condition  y0  and  the  same  run  length  nt=n  V/.  As  before,  truncation  divides  each  time  series  into  two  sub¬ 
series  \{yif.  i=l ,2,. ,.,dj),(y<-  i=dj+\ ,2,..., n;  j=\ ,. where  dj  is  the  truncation  point  for  they*  series.  For 
an  output  that  is  tallied,  under  replication/deletion  we  obtain  a  random  sample  of  m  values  for  the  mean, 
each  estimated  from  one  of  the  reserved  subseries  as  the  corresponding  sample  mean 

—  |  « 

Yj{n,dj  |  v0)  =  — —  = 

n  “  /  i=dj+ 1 

Note,  however,  that  the  MSER  truncation  point  for  the  /h  replication  is  the  integer  random  variable 
D*.  This  means  that  attempts  to  create  interval  estimators  using  run-based  replication  (Conway’s  second 
principle  problem)  are  biased  if  constructed  from  independent  point  estimates  based  on  different  sample 
sizes.  Two  solutions  present  themselves  immediately:  (1)  find  and  apply  the  maximum  truncation 
amount  to  all  runs  to  reduce  them  to  a  common  size;  or  (2)  use  weighted  estimators  for  both  the  mean  and 
variance.  In  this  paper  we  will  investigate  the  second  option,  which  is  standard  in  ANOVA  and  preserves 
as  much  usable  data  as  possible. 

Denote  the  total  number  of  observations  reserved  across  all  runs  as  N  =  ^  [n  —  dj  ] .  We  will  re¬ 
gard  the  Yj  ’s  as  having  a  common  expected  value  and  underlying  variance,  but  different  standard  errors 
because  of  the  different  sample  sizes.  We  adjust  for  this  using  a  weighted  average 


_  m 

j=i 

m  — 

If  W  =1,  i.e.,  we  have  a  convex  combination,  then  Y  is  an  unbiased  estimator  of  the  common 

J= 1  J 

mean.  A  well-known  result  is  that  variance  of  Y  is  minimized  when  Wj=(n-dj*)/N.  Finally,  in  order  to  ob¬ 
tain  an  interval  estimator  we  need  an  estimator  for  the  variance  of  Y 

v«rd)=  J. 

which  is  an  unbiased  estimator  with  tail  —  ^  _  w"  J  degrees  of  freedom.  Note  that  if  the  weights  are  all 

equal  {wj=\hn  V/)  this  reduces  to  the  familiar  sample  variance  formula  with  m- 1  degrees  of  freedom. 
Note  also  that  for  unequal  weighting  the  degrees  of  freedom  in  general  will  not  be  integer  and  the  corres¬ 
ponding  t-value  will  need  to  be  generated  with  software. 
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4  AN  EXAMPLE 

We  ran  a  simulation  of  an  M/M/1  queueing  system  at  traffic  intensity  0.95  (arrival  rate  =  19/time  unit, 
service  rate  =  20/time  unit)  for  n= 10,000  and  n=  100,000  observations.  We  used  delay  in  queue  as  our 
performance  measure,  which  is  suitable  for  tally  statistics  as  described  in  Section  3.  The  output  from  each 
of  10  runs  was  truncated  based  on  MSER-5  and  the  10  resulting  sample  means  were  pooled  using  the 
weighting  process  described  above  to  form  a  90%  confidence  interval.  This  process  was  repeated  1,000 
times  to  create  1,000  confidence  intervals.  Nominally  we  would  expect  90%  of  such  intervals  to  cover 
the  true  answer  (//=  0.95)  obtained  from  theory.  We  created  an  indicator  variable  for  each  confidence  in¬ 
terval  to  record  whether  it  covered  or  did  not  cover  the  true  mean.  The  results  were  analyzed  using  JMP- 
9  and  are  summarized  in  Figure  1. 

Using  a  run  length  of  10,000  we  obtained  empirical  coverage  of  79%  for  a  nominal  90%  Cl.  When 
run  lengths  of  100,000  were  used,  the  empirical  coverage  improved  to  86.5%.  Since  all  Cls  were  based 
on  10  runs,  the  degrees  of  freedom  would  be  9  if  all  run  lengths  were  equal.  Actual  degrees  of  freedom 
varied,  ranging  from  9  down  to  8.68  in  our  2000  sets  of  experiments.  Using  a  conservative  8  degrees  of 
freedom  had  virtually  no  impact  on  the  coverage,  improving  it  only  from  86.5%  to  86.9%  when  run 
lengths  of  100,000  were  used.  Upon  inspection,  failure  to  achieve  nominal  coverage  seems  to  be  because 
the  confidence  intervals  are  centered  at  the  MSER  estimate  of  the  mean,  which  is  known  to  be  biased.  As 
we  observed,  the  bias  has  a  greater  impact  when  shorter  runs  are  used. 


Run  length  n  =  10,000 


Run  length  =  100,000 


Not  Covered  Covered 


Not  Covered  Covered 


Mean 

0.79 

Std  Dev 

0.407512 

Std  Err  Mean 

0.0128867 

Upper  95%  Mean 

0.815288 

Lower  95%  Mean 

0.764712 

Replications  m 

1000 

Mean 

0.865 

Std  Dev 

0.3418946 

Std  Err  Mean 

0.0108117 

Upper  95%  Mean 

0.8862162 

Lower  95%  Mean 

0.8437838 

Replications  m 

1000 

Figure  1:  Statistics  of  the  indicator  variable  recording  coverage  of  the  90%  confidence  interval  on  the 
mean  delay  in  queue  for  a  simulated  M/M/1  queue. 
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5  CONCLUSION 

In  this  paper  we  provide  a  brief  overview  of  the  MSER  truncation  criterion,  a  technical  note  on  the  con¬ 
struction  of  interval  estimates  using  replication/deletion  and  MSER  truncation,  a  weighting  scheme  that 
can  be  applied  to  this  end,  and  an  example  of  such  an  application.  Empirical  results  from  the  example 
suggest  that  the  weighting  of  unequal  samples  modestly  underestimates  the  width  of  confidence  intervals 
on  the  mean,  with  underestimation  decreasing  as  a  function  of  increasing  run  lengths. 
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